
Wave 3 performance pass across diffusion, hodge, spectral, and DEC #19

Merged: Autoparallel merged 22 commits into main from perf/improvements-2 on Mar 7, 2026


Autoparallel commented Mar 7, 2026

Summary

This PR lands the Wave 3 performance pass across the diffusion, diffusion-geometry, spectral, Hodge, and DEC code paths.

What changed

  • added bench_diffgeo, Wave 3 perf note scaffolding, and IGNEOUS_PERF_NOTES_DIR support for the perf scripts
  • reused 1-form operators inside the 2-form diffusion-geometry path and rewrote the generic k-form weak derivative around denser block assembly
  • parallelized the generic form up-Laplacian assembly and replaced a tiny inner linear solve with closed-form adjugate arithmetic
  • cached the normalized kernel operator for repeated eigensolve matvecs
  • fused and parallelized post-kNN diffusion-build passes while preserving the accepted sparse build path
  • reused gamma assembly in HodgeWorkspace
  • replaced circular workspace preparation with exact matrix formulas for Gamma(x, phi) and the scalar weak Laplacian
  • rewrote Hodge curl-energy assembly as matrix operations over cached Markov and gamma data
  • added a dense generalized spectrum fast path with Cholesky whitening and exact fallback
  • parallelized stable DEC face-incidence construction while preserving per-vertex face order
  • recorded kept and rejected experiments in notes/perf/20260306-wave3/
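The PR does not say how large the inner system behind the adjugate change is. Assuming it is 3x3, the closed form inv(A) = adj(A) / det(A) replaces a general solve with a fixed handful of multiplies. This is a minimal sketch under that assumption; `inverse3x3`, `solve3x3`, and the singularity threshold are hypothetical, not igneous code.

```cpp
#include <array>
#include <cmath>
#include <optional>

using Mat3 = std::array<std::array<double, 3>, 3>;

// Closed-form inverse of a 3x3 matrix: inv(A) = adj(A) / det(A), where adj(A)
// is the transpose of the cofactor matrix. Returns std::nullopt when |det(A)|
// is below eps, so a caller can fall back to a general solver.
std::optional<Mat3> inverse3x3(const Mat3& a, double eps = 1e-12) {
    // First-row cofactors, reused for the determinant.
    const double c00 = a[1][1] * a[2][2] - a[1][2] * a[2][1];
    const double c01 = a[1][2] * a[2][0] - a[1][0] * a[2][2];
    const double c02 = a[1][0] * a[2][1] - a[1][1] * a[2][0];
    const double det = a[0][0] * c00 + a[0][1] * c01 + a[0][2] * c02;
    if (std::abs(det) < eps) return std::nullopt;
    const double s = 1.0 / det;

    Mat3 inv{};
    // adj(A)[i][j] = C[j][i]; column 0 holds the first-row cofactors.
    inv[0][0] = c00 * s;
    inv[1][0] = c01 * s;
    inv[2][0] = c02 * s;
    inv[0][1] = (a[0][2] * a[2][1] - a[0][1] * a[2][2]) * s;
    inv[0][2] = (a[0][1] * a[1][2] - a[0][2] * a[1][1]) * s;
    inv[1][1] = (a[0][0] * a[2][2] - a[0][2] * a[2][0]) * s;
    inv[1][2] = (a[0][2] * a[1][0] - a[0][0] * a[1][2]) * s;
    inv[2][1] = (a[0][1] * a[2][0] - a[0][0] * a[2][1]) * s;
    inv[2][2] = (a[0][0] * a[1][1] - a[0][1] * a[1][0]) * s;
    return inv;
}

// Solves A x = b with the closed-form inverse (no pivoting, no iteration).
std::optional<std::array<double, 3>> solve3x3(const Mat3& a, const std::array<double, 3>& b) {
    const auto inv = inverse3x3(a);
    if (!inv) return std::nullopt;
    std::array<double, 3> x{};
    for (int i = 0; i < 3; ++i)
        x[i] = (*inv)[i][0] * b[0] + (*inv)[i][1] * b[1] + (*inv)[i][2] * b[2];
    return x;
}
```

For tiny fixed-size systems this avoids factorization bookkeeping entirely. The explicit determinant check is what a caller would use to detect near-singular blocks.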
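The PR does not spell out the formulation behind the generalized-spectrum fast path. Assuming it targets the symmetric-definite problem (A symmetric, B symmetric positive definite), the standard Cholesky whitening reduction is:

```latex
\begin{aligned}
&A x = \lambda B x, \qquad A = A^\top,\; B = B^\top \succ 0 \\
&B = L L^\top \;(\text{Cholesky}), \qquad y = L^\top x \\
&\Rightarrow\; C\,y = \lambda\,y, \qquad C = L^{-1} A L^{-\top} = C^\top \\
&x = L^{-\top} y, \qquad x_i^\top B\,x_j = y_i^\top y_j = \delta_{ij}
\end{aligned}
```

The whitened matrix C is symmetric, so a dense symmetric eigensolver applies, and C is usually formed with two triangular solves rather than explicit inverses. If the Cholesky factorization fails because B is not numerically positive definite, the reduction is unavailable. That case is presumably where the exact fallback takes over.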
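The PR does not describe how face-incidence construction is parallelized. A common way to build incidence in parallel without losing the serial per-vertex order is a two-pass count, prefix-sum, and fill over contiguous face chunks. This sketch makes several assumptions: triangle faces are vertex triples, vertex ids are below `num_vertices`, and `std::thread` stands in for whatever parallel backend igneous uses. `build_vertex_faces` and `VertexFaceIncidence` are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <thread>
#include <vector>

// CSR-style vertex -> face incidence: the faces of vertex v are
// face_ids[offsets[v] .. offsets[v + 1]).
struct VertexFaceIncidence {
    std::vector<std::size_t> offsets;
    std::vector<std::size_t> face_ids;
};

// Each vertex's faces come out in ascending face index, exactly as a serial loop
// would produce them. Faces are split into contiguous chunks. Each chunk counts
// its entries, and a prefix sum over (vertex, chunk) gives every chunk a private
// write window. Chunks then fill their windows independently, with no atomics.
VertexFaceIncidence build_vertex_faces(const std::vector<std::array<std::size_t, 3>>& faces,
                                       std::size_t num_vertices, std::size_t num_threads) {
    if (num_threads == 0) num_threads = 1;
    const std::size_t nf = faces.size();
    const std::size_t chunk = (nf + num_threads - 1) / num_threads;
    auto chunk_begin = [&](std::size_t t) { return std::min(nf, t * chunk); };

    // Pass 1: counts[t][v] = occurrences of vertex v in chunk t.
    std::vector<std::vector<std::size_t>> counts(num_threads, std::vector<std::size_t>(num_vertices, 0));
    {
        std::vector<std::thread> workers;
        for (std::size_t t = 0; t < num_threads; ++t)
            workers.emplace_back([&, t] {
                for (std::size_t f = chunk_begin(t); f < chunk_begin(t + 1); ++f)
                    for (std::size_t v : faces[f]) ++counts[t][v];
            });
        for (auto& w : workers) w.join();
    }

    // Serial exclusive prefix sum: vertex-major, chunk order within each vertex.
    VertexFaceIncidence out;
    out.offsets.assign(num_vertices + 1, 0);
    std::vector<std::vector<std::size_t>> cursor(num_threads, std::vector<std::size_t>(num_vertices, 0));
    std::size_t running = 0;
    for (std::size_t v = 0; v < num_vertices; ++v) {
        out.offsets[v] = running;
        for (std::size_t t = 0; t < num_threads; ++t) {
            cursor[t][v] = running;
            running += counts[t][v];
        }
    }
    out.offsets[num_vertices] = running;
    out.face_ids.resize(running);

    // Pass 2: each chunk writes only inside its own windows, in face order.
    {
        std::vector<std::thread> workers;
        for (std::size_t t = 0; t < num_threads; ++t)
            workers.emplace_back([&, t] {
                for (std::size_t f = chunk_begin(t); f < chunk_begin(t + 1); ++f)
                    for (std::size_t v : faces[f]) out.face_ids[cursor[t][v]++] = f;
            });
        for (auto& w : workers) w.join();
    }
    return out;
}
```

The per-chunk count and cursor arrays cost O(threads x vertices) extra memory. That is the price of race-free writes with an output identical to the serial build.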

CI smoke impact vs main

The PR smoke workflow compares this branch against main on GitHub-hosted runners, using the parallel backend with 4 threads. Use these numbers for branch-vs-main review.

Headline wins from a smoke run of this branch (the figures differ slightly from the bot report below, which comes from a separate run):

  • bench_pipeline_hodge_main: 1.763 s -> 372.939 ms (-78.85%)
  • bench_hodge_phase_circular: 1.082 s -> 13.206 ms (-98.78%)
  • bench_hodge_phase_curl_energy: 294.807 ms -> 39.683 ms (-86.54%)
  • bench_hodge_phase_weak_derivative: 27.624 ms -> 4.518 ms (-83.64%)
  • bench_pipeline_diffusion_main/20: 67.129 ms -> 27.147 ms (-59.56%)
  • bench_pipeline_diffusion_main/100: 70.464 ms -> 30.595 ms (-56.58%)
  • bench_pipeline_spectral_main: 110.190 ms -> 50.714 ms (-53.98%)
  • bench_diffusion_build/2000: 54.226 ms -> 20.929 ms (-61.40%)
  • bench_weak_derivative/2000/16: 2.727 ms -> 121.210 us (-95.55%)
  • bench_curl_energy/2000/16: 10.989 ms -> 1.613 ms (-85.32%)
  • bench_eigenbasis/2000/16: 23.269 ms -> 14.480 ms (-37.77%)
  • bench_geometry_structure_ms/1000x1000: 89.510 ms -> 74.975 ms (-16.24%)
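Each delta is consistent with (current - baseline) / baseline x 100, computed after converting both timings to the same unit. A negative value means the branch is faster. For example, 1.763 s -> 372.939 ms works out to about -78.85%. This is a minimal sketch of that arithmetic. The function names are hypothetical; the perf scripts' actual implementation is not part of this PR.

```cpp
#include <stdexcept>
#include <string>

// Converts a timing from the smoke report ("s", "ms", or "us") to milliseconds.
double to_ms(double value, const std::string& unit) {
    if (unit == "s") return value * 1e3;
    if (unit == "ms") return value;
    if (unit == "us") return value * 1e-3;
    throw std::invalid_argument("unknown unit: " + unit);
}

// Relative change in percent; negative means the current build is faster.
double delta_percent(double baseline_ms, double current_ms) {
    return (current_ms - baseline_ms) / baseline_ms * 100.0;
}
```

Normalizing units first matters for rows such as bench_weak_derivative/2000/16, where the baseline is in ms and the result is in us.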

Local campaign deltas

Some of the larger numbers in the Wave 3 notes are measured against earlier local Release baselines from this campaign rather than directly against main. They capture the cumulative effect of the full optimization campaign on the local perf machine, so they are not directly comparable to the CI smoke numbers above.

Examples from the kept Wave 3 artifacts:

  • bench_diffgeo_pipeline/4000/64/32/32: 4169.532 ms -> 145.730 ms
  • bench_pipeline_hodge_main: 3275.117 ms -> 149.293 ms
  • bench_pipeline_spectral_main: 216.192 ms -> 16.783 ms
  • igneous-diffusion-geometry --n-points 1000 --output-dir <tempdir>: real 1.40 s -> 0.07 s

Validation

  • ctest --test-dir build --output-on-failure -j8: 14/14 passing
  • targeted diffgeo, hodge, spectral, and DEC tests were run for each kept pass and logged in notes/perf/20260306-wave3/journal.md

Notes

  • the detailed campaign log, artifacts, and rejected experiments are in notes/perf/20260306-wave3/
  • the main remaining shared hotspot is still the large-basis diffusion eigenbasis solve

github-actions (bot) commented Mar 7, 2026

Perf Smoke Report

Baseline ref: main
Baseline: 8a670b7b8bdd3f5126b94c8c1028f302b36a0b51
Backend: parallel
Threads: 4

PR smoke: bench_geometry (baseline 8a670b7)

  • Baseline source: artifacts/perf/base/bench_geometry_base.txt
  • Current source: artifacts/perf/head/bench_geometry_head.txt
  • Baseline commit: 8a670b7b8bdd3f5126b94c8c1028f302b36a0b51
  • Comparable benchmarks: 16/16
  • Improved: 10 | Regressed: 6

Top Wins

| Benchmark | Baseline | Current | Delta |
| --- | --- | --- | --- |
| bench_geometry_structure_ms/1000x1000 | 83.480 ms | 74.174 ms | -11.15% |
| bench_geometry_structure_ms/250x250 | 5.330 ms | 4.744 ms | -10.99% |
| bench_geometry_structure_ms/500x500 | 21.247 ms | 19.055 ms | -10.32% |
| bench_geometry_flow_ms/500x500 | 2.387 ms | 2.203 ms | -7.71% |
| bench_geometry_structure_ms/100x100 | 522.000 us | 494.000 us | -5.36% |

Top Regressions

| Benchmark | Baseline | Current | Delta |
| --- | --- | --- | --- |
| bench_geometry_flow_ms/100x100 | 68.000 us | 71.000 us | +4.41% |
| bench_geometry_curvature_ms/250x250 | 5.447 ms | 5.543 ms | +1.76% |
| bench_geometry_curvature_ms/500x500 | 22.010 ms | 22.351 ms | +1.55% |
| bench_geometry_flow_ms/250x250 | 440.000 us | 446.000 us | +1.36% |
| bench_geometry_curvature_ms/1000x1000 | 87.042 ms | 87.945 ms | +1.04% |

Full Comparison

| Benchmark | Baseline | Current | Delta | Status |
| --- | --- | --- | --- | --- |
| bench_geometry_curvature_ms/1000x1000 | 87.042 ms | 87.945 ms | +1.04% | regressed |
| bench_geometry_curvature_ms/100x100 | 911.000 us | 920.000 us | +0.99% | regressed |
| bench_geometry_curvature_ms/250x250 | 5.447 ms | 5.543 ms | +1.76% | regressed |
| bench_geometry_curvature_ms/500x500 | 22.010 ms | 22.351 ms | +1.55% | regressed |
| bench_geometry_flow_ms/1000x1000 | 8.914 ms | 8.618 ms | -3.32% | improved |
| bench_geometry_flow_ms/100x100 | 68.000 us | 71.000 us | +4.41% | regressed |
| bench_geometry_flow_ms/250x250 | 440.000 us | 446.000 us | +1.36% | regressed |
| bench_geometry_flow_ms/500x500 | 2.387 ms | 2.203 ms | -7.71% | improved |
| bench_geometry_frame_ms/1000x1000 | 179.436 ms | 170.737 ms | -4.85% | improved |
| bench_geometry_frame_ms/100x100 | 1.501 ms | 1.485 ms | -1.07% | improved |
| bench_geometry_frame_ms/250x250 | 11.217 ms | 10.733 ms | -4.31% | improved |
| bench_geometry_frame_ms/500x500 | 45.644 ms | 43.609 ms | -4.46% | improved |
| bench_geometry_structure_ms/1000x1000 | 83.480 ms | 74.174 ms | -11.15% | improved |
| bench_geometry_structure_ms/100x100 | 522.000 us | 494.000 us | -5.36% | improved |
| bench_geometry_structure_ms/250x250 | 5.330 ms | 4.744 ms | -10.99% | improved |
| bench_geometry_structure_ms/500x500 | 21.247 ms | 19.055 ms | -10.32% | improved |
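The Status labels and the "Improved | Regressed" counts are consistent with a plain sign test on Delta. Every positive delta is marked regressed, however small. This is a minimal sketch of that rule; the names are hypothetical, and the report script's actual logic is not shown in this PR.

```cpp
#include <cstddef>
#include <vector>

struct StatusTally {
    std::size_t improved = 0;
    std::size_t regressed = 0;
};

// Negative delta -> "improved", positive delta -> "regressed", with no noise
// threshold (matching, e.g., bench_1form_gram at +0.52% being "regressed").
// Counting an exact 0.0 as neither is an assumption; no row here has that value.
StatusTally tally_status(const std::vector<double>& deltas_percent) {
    StatusTally t;
    for (double d : deltas_percent) {
        if (d < 0.0) ++t.improved;
        else if (d > 0.0) ++t.regressed;
    }
    return t;
}
```

Applied to the 16 bench_geometry deltas above, this rule yields 10 improved and 6 regressed, which matches the report's summary line.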

PR smoke: bench_dod (baseline 8a670b7)

  • Baseline source: artifacts/perf/base/bench_dod_base.json
  • Current source: artifacts/perf/head/bench_dod_head.json
  • Baseline commit: 8a670b7b8bdd3f5126b94c8c1028f302b36a0b51
  • Comparable benchmarks: 11/11
  • Improved: 7 | Regressed: 4

Top Wins

| Benchmark | Baseline | Current | Delta |
| --- | --- | --- | --- |
| bench_weak_derivative/2000/16 | 2.725 ms | 122.224 us | -95.52% |
| bench_curl_energy/2000/16 | 10.989 ms | 1.649 ms | -84.99% |
| bench_diffusion_build/2000 | 54.223 ms | 20.945 ms | -61.37% |
| bench_hodge_solve/2000/16 | 234.419 us | 139.800 us | -40.36% |
| bench_eigenbasis/2000/16 | 23.344 ms | 14.639 ms | -37.29% |

Top Regressions

| Benchmark | Baseline | Current | Delta |
| --- | --- | --- | --- |
| bench_markov_multi_step/2000/20 | 676.423 us | 704.014 us | +4.08% |
| bench_markov_step/2000 | 33.622 us | 34.803 us | +3.51% |
| bench_markov_multi_step/20000/20 | 7.529 ms | 7.615 ms | +1.15% |
| bench_1form_gram/2000/16 | 884.332 us | 888.930 us | +0.52% |

Full Comparison

| Benchmark | Baseline | Current | Delta | Status |
| --- | --- | --- | --- | --- |
| bench_1form_gram/2000/16 | 884.332 us | 888.930 us | +0.52% | regressed |
| bench_curl_energy/2000/16 | 10.989 ms | 1.649 ms | -84.99% | improved |
| bench_curvature_kernel/400 | 13.869 ms | 12.653 ms | -8.76% | improved |
| bench_diffusion_build/2000 | 54.223 ms | 20.945 ms | -61.37% | improved |
| bench_eigenbasis/2000/16 | 23.344 ms | 14.639 ms | -37.29% | improved |
| bench_flow_kernel/400 | 2.201 ms | 2.163 ms | -1.75% | improved |
| bench_hodge_solve/2000/16 | 234.419 us | 139.800 us | -40.36% | improved |
| bench_markov_multi_step/2000/20 | 676.423 us | 704.014 us | +4.08% | regressed |
| bench_markov_multi_step/20000/20 | 7.529 ms | 7.615 ms | +1.15% | regressed |
| bench_markov_step/2000 | 33.622 us | 34.803 us | +3.51% | regressed |
| bench_weak_derivative/2000/16 | 2.725 ms | 122.224 us | -95.52% | improved |

PR smoke: bench_pipelines (baseline 8a670b7)

  • Baseline source: artifacts/perf/base/bench_pipelines_base.json
  • Current source: artifacts/perf/head/bench_pipelines_head.json
  • Baseline commit: 8a670b7b8bdd3f5126b94c8c1028f302b36a0b51
  • Comparable benchmarks: 10/10
  • Improved: 8 | Regressed: 2

Top Wins

| Benchmark | Baseline | Current | Delta |
| --- | --- | --- | --- |
| bench_hodge_phase_circular | 1.076 s | 12.170 ms | -98.87% |
| bench_hodge_phase_curl_energy | 295.217 ms | 39.477 ms | -86.63% |
| bench_hodge_phase_weak_derivative | 27.666 ms | 4.383 ms | -84.16% |
| bench_pipeline_hodge_main | 1.764 s | 374.429 ms | -78.78% |
| bench_pipeline_diffusion_main/20 | 67.314 ms | 27.292 ms | -59.46% |

Top Regressions

| Benchmark | Baseline | Current | Delta |
| --- | --- | --- | --- |
| bench_hodge_phase_eigenbasis | 236.048 ms | 239.129 ms | +1.31% |
| bench_hodge_phase_gram | 5.371 ms | 5.400 ms | +0.54% |

Full Comparison

| Benchmark | Baseline | Current | Delta | Status |
| --- | --- | --- | --- | --- |
| bench_hodge_phase_circular | 1.076 s | 12.170 ms | -98.87% | improved |
| bench_hodge_phase_curl_energy | 295.217 ms | 39.477 ms | -86.63% | improved |
| bench_hodge_phase_eigenbasis | 236.048 ms | 239.129 ms | +1.31% | regressed |
| bench_hodge_phase_gram | 5.371 ms | 5.400 ms | +0.54% | regressed |
| bench_hodge_phase_solve | 7.496 ms | 4.871 ms | -35.02% | improved |
| bench_hodge_phase_weak_derivative | 27.666 ms | 4.383 ms | -84.16% | improved |
| bench_pipeline_diffusion_main/100 | 70.671 ms | 30.730 ms | -56.52% | improved |
| bench_pipeline_diffusion_main/20 | 67.314 ms | 27.292 ms | -59.46% | improved |
| bench_pipeline_hodge_main | 1.764 s | 374.429 ms | -78.78% | improved |
| bench_pipeline_spectral_main | 110.030 ms | 50.954 ms | -53.69% | improved |

Autoparallel merged commit e0e8e9f into main on Mar 7, 2026. 15 checks passed.